Cryptography based on neural networks - analytical results

The mutual learning process between two parity feed-forward networks with discrete and continuous
weights is studied analytically, and we find that the number of steps required to achieve full
synchronization between the two networks in the case of discrete weights is finite. The
synchronization process is shown to be non-self-averaging, and the analytical solution is based on
random auxiliary variables. The learning time of an attacker that is trying to imitate one of the
networks is examined analytically and is found to be much longer than the synchronization time.
The analytical results are found to be in agreement with simulations.

The study of neural networks was originally driven by their potential as powerful learning and
memory machines. Statistical mechanics methods have been used to analyze the networks' abilities
and to explore their limitations [1, 2]. In a recent paper [3], a bridge between the theory of
neural networks and cryptography was established. It was shown numerically that two randomly
initialized neural networks with one layer of hidden units (so-called Parity Machines, PMs) that
learn from each other are able to synchronize. The two parties have common inputs and they
exchange information about their outputs. In the case of disagreement, the two PMs are trained by
the Hebbian learning rule on their mutual outputs, and they develop a fully synchronized state of
their synaptic weights. This synchronization procedure can be used to construct an ephemeral key
exchange protocol for the secure transmission of secret data. An attacker who knows the
architecture of the two parties and the common inputs, and who observes the mutual exchange of
information, finds it difficult to imitate the moves of the parties and to reveal the common
parameters after synchronization. Each party holds secret information that is known neither to the
other party nor to possible attackers: its initial weights and the current state of its hidden
units, which we denote as internal representations (IRs).

During the last decade, learning from examples by feed-forward multilayer networks was examined
exhaustively using statistical mechanics methods [1], [?]. An interesting network belonging to
this class is the tree PM, which is characterized by a superior capacity, as was found by replica
calculations [?]. The study of the generalization ability of such networks was based on a set of
training examples generated by a static teacher network. Here we discuss a case where two or
several multilayer networks are trained by their mutual outputs. This scenario has been solved
only for perceptrons, and only for continuous ones [?]. Here we present an analytic solution for
PMs with continuous as well as with discrete weights.

In our cryptosystem, each party in the secure channel is represented by a feed-forward network
consisting of $KN$ random input elements $x_{ji} = \pm 1$, $j = 1, \dots, N$, $K$ binary hidden
units $\tau_i = \pm 1$, $i = 1, \dots, K$, and one binary output unit $\sigma = \prod_i \tau_i$.
For simplicity of the calculations presented below, we concentrate only on the case of a tree PM
with 3 binary hidden units feeding a binary output, $\sigma = \tau_1 \tau_2 \tau_3$. The hidden
units are determined via the Boolean functions $\tau_i = \mathrm{sgn}(\sum_j W_{ji} x_{ji})$
through three disjoint sets of inputs, $X_i = x_{1i}, \dots, x_{Ni}$. The weights are either
discrete or continuous, and the analytical results are derived for $N \gg 1$.
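The architecture described above can be sketched in a few lines of Python. This is only an illustration: the sizes, the seed, the array layout `W[i][j]` (hidden unit first, input second) and the treatment of a vanishing local field are assumptions that the text does not specify.

```python
import random

def sign(h):
    # sgn(h); the tie h == 0 is mapped to +1 here (an assumption, the text does not fix it)
    return 1 if h >= 0 else -1

def hidden_units(W, X):
    # W[i][j] and X[i][j] hold weight and input j of hidden unit i (W_ji, x_ji in the text)
    return [sign(sum(w * x for w, x in zip(Wi, Xi))) for Wi, Xi in zip(W, X)]

def output(tau):
    # parity of the hidden units: sigma = tau_1 * tau_2 * ... * tau_K
    sigma = 1
    for t in tau:
        sigma *= t
    return sigma

rng = random.Random(0)          # hypothetical seed, for reproducibility only
K, N, L = 3, 101, 3             # illustrative sizes, not values taken from the text
W = [[rng.randint(-L, L) for _ in range(N)] for _ in range(K)]   # discrete weights in -L..L
X = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(K)]  # disjoint random inputs
tau = hidden_units(W, X)        # internal representation (IR)
sigma = output(tau)
print(tau, sigma)
```

Each hidden unit sees its own block of `N` inputs, which is what makes the machine a tree; the list `tau` is the internal representation that stays hidden from the other party.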

In this Letter we present: (a) An analytical solution of the mutual learning of two PMs whose
weight vectors are updated according to the mismatch between their mutual information, namely
their outputs. Synchronization is achieved in the case of discrete weights,
$W_{ji} = 0, \pm 1, \dots, \pm L$, as well as for continuous weights confined to a sphere,
$\sum_{j=1}^{N} W_{ji}^2 = N$. (b) An analysis of on-line adaptation of discrete weights, in which
each change of a component is not infinitesimally small; this demands methods different from the
standard ones [?] and is at the center of the discussion below. Surprisingly, synchronization is
achieved for the discrete weights in a finite number of steps. (c) The dynamical evolution of the
discrete networks cannot be characterized by the time evolution of the standard order parameters,
since the overlaps between the weight vectors are not self-averaging [?], even for large networks.
The analytical solution is based on a calculation of the evolution of the distribution of the
order parameters as a function of the initial set of the weights. (d) The analysis is extended to
include a possible attacker.

For simplicity of presentation, we first describe the analytical methods developed for the
discrete case, where detailed results are presented for particular cases. At the end of this
Letter, results for the continuous case are also briefly summarized.

The updating procedure between the two parties, A and B, that are trying to synchronize their
weights is defined as follows. In each time step, the outputs of the two parties are calculated
for a common random input. In each of the two parties, only the weights belonging to the one (or
three) hidden units whose state is equal to that party's output are updated, and only when the two
outputs disagree. The updating is done according to the following Hebbian learning rule,

$$W_{ji}^{A+} = W_{ji}^{A} + K(W_{ji}^{A} x_{ji} \sigma^{B})\, x_{ji} \sigma^{B}\, \theta(\sigma^{A} \tau_i^{A})\, \theta(-\sigma^{A} \sigma^{B}),$$
$$W_{ji}^{B+} = W_{ji}^{B} + K(W_{ji}^{B} x_{ji} \sigma^{A})\, x_{ji} \sigma^{A}\, \theta(\sigma^{B} \tau_i^{B})\, \theta(-\sigma^{A} \sigma^{B}), \qquad (1)$$

where $K(y) = 1 - \delta_{L,y}$ and $\delta$ represents the Kronecker function. The purpose of the
operator $K(y)$ is to prevent the increment (decrement) of the strength of the weights at the
boundary value $L$ ($-L$).
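A minimal simulation sketch of this updating rule is given below. It is not the authors' code: the network sizes, the seed, the step limit and the treatment of a vanishing local field (party A takes +1, party B takes -1, so that an anti-parallel pair always disagrees) are assumptions made for the illustration. The run stops at the first step in which all weight vectors are anti-parallel.

```python
import random

def sign(h, tie):
    # sgn(h) with an explicit value for h == 0 (the tie rule is an assumption)
    return tie if h == 0 else (1 if h > 0 else -1)

def forward(W, X, tie):
    tau = [sign(sum(w * x for w, x in zip(Wi, Xi)), tie) for Wi, Xi in zip(W, X)]
    sigma = 1
    for t in tau:
        sigma *= t
    return tau, sigma

def update(W, X, tau, sigma_own, sigma_other, L):
    # learning rule (1): train on the partner's output, only if the outputs disagree
    if sigma_own == sigma_other:
        return
    for i, (Wi, Xi) in enumerate(zip(W, X)):
        if tau[i] != sigma_own:          # theta(sigma * tau_i): unit must equal own output
            continue
        for j, x in enumerate(Xi):
            step = x * sigma_other
            if Wi[j] * step != L:        # K(y) = 1 - delta_{L,y}: no move beyond +-L
                Wi[j] += step

def synchronize(N=100, L=1, K=3, max_steps=20000, seed=1):
    rng = random.Random(seed)
    WA = [[rng.randint(-L, L) for _ in range(N)] for _ in range(K)]
    WB = [[rng.randint(-L, L) for _ in range(N)] for _ in range(K)]
    for step in range(1, max_steps + 1):
        X = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(K)]
        tauA, sA = forward(WA, X, tie=1)
        tauB, sB = forward(WB, X, tie=-1)
        update(WA, X, tauA, sA, sB, L)   # both parties use the outputs of the same step
        update(WB, X, tauB, sB, sA, L)
        if all(a == -b for Wa, Wb in zip(WA, WB) for a, b in zip(Wa, Wb)):
            return step, WA, WB          # first anti-parallel state
    return None, WA, WB

if __name__ == "__main__":
    steps, _, _ = synchronize()
    print("anti-parallel after", steps, "steps")
```

When a hidden unit disagrees between the parties and is updated, both parties move the corresponding components in opposite directions, so the sum of a component pair changes only at the boundaries $\pm L$; this is one way to see why the bounded discrete weights end up anti-parallel.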

Two simulation results are crucial for the analytical description of the mutual dynamics. The
first observation is that the synchronization time is finite [?]. The second is that different
runs (sets of random inputs) of the above dynamics, but with fixed initial conditions for the two
parties, result in different sets of IRs. From these two observations we conclude that the
variance of the overlaps between the two parties is finite and does not shrink to zero even in the
thermodynamic limit. This unusual scenario of on-line mutual learning is taken into account in the
analytical equations by selecting random IRs, with the freedom given by the current analytical
overlaps. We find an iterative discrete set of equations for the mutual overlaps between the
parties, whose evolution depends on some random but correlated ingredients, the current IRs,
$\{\tau_i^{A}, \tau_i^{B}\}$ (see Eq. (1)).

In each time step, $p$, the mutual state of the two parties is defined by a
$(2L+1) \times (2L+1)$ matrix, $F^{i}(p)$, where $i$ represents the hidden unit. The element
$f_{qr}^{i}$ of the matrix stands for the fraction of components in the $i$th weight vector which
are equal to $q$ ($r$) in the first (second) party, where $q, r = 0, \pm 1, \dots, \pm L$. The
overlap of the weights belonging to the $i$th hidden unit in the two parties,
$R_i^{A,B} = \mathbf{W}_i^{A} \cdot \mathbf{W}_i^{B} / N$, as well as their norms,
$Q_i = \mathbf{W}_i \cdot \mathbf{W}_i / N$, are given by the matrix elements,

$$R_i^{A,B} = \sum_{q,r} q\, r\, f_{qr}^{i}, \qquad Q_i^{A} = \sum_{q,r} q^2 f_{qr}^{i}, \qquad Q_i^{B} = \sum_{q,r} r^2 f_{qr}^{i}.$$

These overlaps and norms fix the probabilities of deriving the same IR via the normalized overlap,
$\rho_i = R_i^{A,B} / \sqrt{Q_i^{A} Q_i^{B}}$. More precisely, the probability of having different
results in the $i$th hidden unit of the two parties is given by the well-known generalization
error of the perceptron, $\epsilon_p^{i} = \cos^{-1}(\rho_i) / \pi$ [1], [?].
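As an illustration of how the order parameters follow from the matrix elements, the sketch below computes the overlap, the norms, the normalized overlap and the disagreement probability for a single hidden unit. The storage convention `f[q + L][r + L]` and the function name are assumptions made for this example.

```python
import math

def order_parameters(f, L):
    # f[q + L][r + L]: fraction of components equal to q in party A and to r in party B
    vals = range(-L, L + 1)
    R = sum(q * r * f[q + L][r + L] for q in vals for r in vals)
    QA = sum(q * q * f[q + L][r + L] for q in vals for r in vals)
    QB = sum(r * r * f[q + L][r + L] for q in vals for r in vals)
    rho = R / math.sqrt(QA * QB)
    rho = max(-1.0, min(1.0, rho))       # guard against rounding slightly outside [-1, 1]
    eps = math.acos(rho) / math.pi       # probability that the two hidden units disagree
    return R, QA, QB, rho, eps

L = 1
uniform = [[1.0 / 9] * 3 for _ in range(3)]      # uncorrelated, uniformly distributed weights
anti = [[0.0] * 3 for _ in range(3)]
for q in range(-L, L + 1):
    anti[q + L][-q + L] = 1.0 / 3                # every component pair obeys r = -q
print(order_parameters(uniform, L))
print(order_parameters(anti, L))
```

For uncorrelated uniform weights the normalized overlap vanishes and the hidden units disagree half of the time, whereas the anti-parallel matrix gives a normalized overlap of $-1$ and a disagreement probability of one.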

Each PM has a tree architecture, and for random inputs each of the 8 IRs appears with equal
probability. The joint probability distribution of the 64 different pairs of IRs in both parties
is correlated, and can be explicitly expressed using $\{\epsilon_p^{i}\}$.
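One way to write this joint distribution explicitly is sketched below. It assumes that the three hidden units are statistically independent (they receive disjoint inputs) and that the disagreement of a unit is independent of its sign in the first party; the function names are hypothetical.

```python
import math
from itertools import product

def joint_ir_distribution(eps):
    # eps[i]: probability that hidden unit i differs between the two parties
    K = len(eps)
    dist = {}
    for ir_a in product((-1, 1), repeat=K):
        for ir_b in product((-1, 1), repeat=K):
            p = 1.0 / 2 ** K                     # each IR of the first party is equally probable
            for a, b, e in zip(ir_a, ir_b, eps):
                p *= e if a != b else 1.0 - e
            dist[(ir_a, ir_b)] = p
    return dist

def prob_output_mismatch(dist):
    # total weight of IR pairs whose parities (outputs) differ
    return sum(p for (a, b), p in dist.items() if math.prod(a) != math.prod(b))

dist = joint_ir_distribution((0.5, 0.5, 0.5))
print(len(dist), sum(dist.values()), prob_output_mismatch(dist))
```

The outputs differ exactly when an odd number of hidden units disagree, so for three units that always disagree the outputs always differ.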

The evolution of the elements of the matrix $F^{i}(p)$ is calculated directly from Eq. (1), where
one has to average over the inputs $x_{ji}$. We use auxiliary random variables in order to choose
one of the possible IRs according to their probabilities, given by $\{\epsilon_p^{i}\}$. In each
step we choose two sets of random numbers which are taken from a flat distribution between 0 and
1. Set I: in the event that the number drawn for hidden unit $i$ is smaller than
$\epsilon_p^{i}$, we deduce that the two hidden units disagree; otherwise we assume an agreement.
Set II: all eight IRs are equally probable in the first party, since the architecture consists of
a tree PM. We choose one among the eight using the second set of auxiliary variables, $p_r$, and
the corresponding IR for the second network according to the first set.
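The two sets of auxiliary random numbers can be illustrated as follows. This is a sketch only: the encoding of the eight IRs by the bits of an integer index and the function name are choices made for the example.

```python
import random

def sample_ir_pair(eps, rng):
    # Set I: one flat random number per hidden unit; below eps[i] means the units disagree
    disagree = [rng.random() < e for e in eps]
    # Set II: one flat random number selects one of the 8 equally probable IRs of party A
    index = int(rng.random() * 8)
    ir_a = [1 if (index >> k) & 1 else -1 for k in range(3)]
    # the IR of party B follows from the IR of party A and the agreement pattern
    ir_b = [-a if d else a for a, d in zip(ir_a, disagree)]
    return ir_a, ir_b

rng = random.Random(0)                   # hypothetical seed
print(sample_ir_pair((0.5, 0.5, 0.5), rng))
```

Because the random numbers change from step to step, two runs started from the same matrices follow different trajectories, which is how the non-self-averaging behavior enters the analytical description.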

To exemplify the derivation of the iterative equations for $\{f_{qr}^{i}\}$, let us concentrate on
the case where the result of the first random set is that all three hidden units are in
disagreement. In two possibilities out of the eight IRs all three hidden units are updated,
whereas in the other six possibilities only one is updated (we then have to choose at random one
among the three). After taking into account all possible internal scenarios, and the corresponding
updates, one can show that the iterative equations for $\{f_{qr}^{i}\}$ away from the boundary,
$q, r \neq \pm L$, are given by

$$f_{q,r}^{i+} = \theta(\tfrac{1}{4} - p_r) \left( \tfrac{1}{2} f_{q+1,r-1}^{i} + \tfrac{1}{2} f_{q-1,r+1}^{i} \right) + \theta(\tfrac{i+1}{4} - p_r)\, \theta(p_r - \tfrac{i}{4}) \left( \tfrac{1}{2} f_{q+1,r-1}^{i} + \tfrac{1}{2} f_{q-1,r+1}^{i} \right).$$

When $p_r$ falls outside both ranges, hidden unit $i$ is not updated in this step and
$f_{qr}^{i}$ remains unchanged. On the boundary, as well as for the other internal scenarios,
similar equations can be derived. Taking into account all possible scenarios and the inversion
symmetry of our PMs, one has to solve iteratively only 4 classes of equations in a manner similar
to the above [?]. Note that the time evolution of the $f_{qr}^{i}$ and of the overlaps depends on
time-dependent random variables.
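The scenario in which all three hidden units disagree can be sketched as a single update of the three matrices. Away from the boundary the code reproduces the shift of weight from the neighboring anti-diagonal elements; the treatment of the boundary is not given in the text and is derived here from the operator $K$ of the learning rule, so it should be read as an assumption. The function names and the choice of intervals for the auxiliary number are also illustrative.

```python
def shift_opposite(f, L):
    # both parties update unit i: A adds s = x * sigma^B, B adds -s, each s with probability 1/2
    n = 2 * L + 1
    new = [[0.0] * n for _ in range(n)]
    for q in range(-L, L + 1):
        for r in range(-L, L + 1):
            mass = f[q + L][r + L]
            for s in (1, -1):
                q2 = q if q == s * L else q + s      # party A is blocked at the boundary
                r2 = r if r == -s * L else r - s     # party B is blocked at the boundary
                new[q2 + L][r2 + L] += 0.5 * mass
    return new

def step_all_disagree(F, L, p_r):
    # F: the three matrices f^i; p_r: auxiliary number drawn from a flat distribution on [0, 1)
    out = []
    for i, f in enumerate(F, start=1):
        all_units = p_r < 0.25                       # two of the eight IRs: all units updated
        only_unit_i = i / 4 <= p_r < (i + 1) / 4     # six of the eight IRs: a single unit
        out.append(shift_opposite(f, L) if (all_units or only_unit_i) else [row[:] for row in f])
    return out

L = 1
uniform = [[1.0 / 9] * 3 for _ in range(3)]
G = step_all_disagree([uniform, uniform, uniform], L, 0.6)
print(G[1])
```

In this sketch the total weight of each matrix is conserved, and a matrix that is already concentrated on the anti-diagonal $r = -q$ stays there, consistent with the anti-parallel state being preserved by the dynamics.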

Different runs of the iterative equations result in different trajectories of the order
parameters. In the inset of Fig. 1, we present the average overlap,
$\bar{\rho} = \sum_{i=1}^{3} \rho_i / 3$, and its standard deviation, obtained from 500 different
runs of the analytical equations with $L = 1$. Results for the averaged overlap (with the same
standard deviation) obtained in 500 runs of simulations with $N = 10^4$ are denoted by circles.

An important quantity is the number of steps required to achieve full synchronization,
$t_{synch}$, since it can be used by the parties to encrypt/decrypt the information using the
known output bit. In simulations the synchronization time is well defined: it is the first step in
which all weight vectors of the parties are in an anti-parallel state. In contrast, in the
analytical solution the average overlap of the hidden units approaches its final value only
exponentially with the number of steps. In order to compare analytical results to simulations we
need a criterion which determines synchronization. We chose the criterion
$\bar{\rho} < -q = -1 + 0.1/(NL)$ to define full synchronization, since $q$ is much greater than
the maximal possible overlap just before synchronization.

FIG. 1: The histogram of $t_{synch}$ (solid line) and $t_{learn}$ (dashed line), as obtained in
different runs of the discrete iterative equations for PMs with $L = 1$. Symbols stand for
simulation results with $N = 10000$, based on 500 runs. Inset: numerical results for $\bar{\rho}$
as a function of the number of steps. Analytical results (solid line) and simulation results
(circles) include the standard deviation obtained from 500 different runs.

The exponential decay of the overlaps with the number of steps, together with the claim that
synchronization is achieved in a finite number of steps even for $N \gg 1$, has to be clarified.
Our synchronization process is mainly characterized by two regimes. The first regime consists of
the first $t_a$ steps, which are characterized by different IRs (in some of the steps) for the two
parties. Note that $t_a$ fluctuates from sample to sample. The second is the asymptotic regime,
the last $t_b$ steps, where the IRs of the parties are always the same and the weights converge to
an anti-parallel state similarly to three perceptrons, $t_b \propto \log(N)$ [?]. Roughly
speaking, the two regimes are characterized by $\epsilon_p^{i} > 1/t_a$ and
$\epsilon_p^{i} < 1/t_b$, respectively. Our analytical results as well as simulations indicate
that $t_a$ is independent of $N$. Hence, as long as $t_a > t_b$, the $\log(N)$ dependence is
invisible. For $L = 3$, for instance, $t_{synch} \sim 400$, $t_a \sim 300$, and $t_b$ is expected
to be equal to $t_a$ only for $N \sim e^{200}$.

In Fig. 1, we present the histogram of the synchronization time, $P(t_{synch})$, obtained in
simulations with $N = 10^4$ and $L = 1$, where the initial weights were chosen such that
$\rho_i = 0$. This distribution is in fairly good agreement with the results obtained from runs of
the iterative equations for $f_{qr}^{i}$.

Let us now examine a possible attack by a third player, an attacker O, that tries to imitate one
of the parties (let us say A). We assume that the attacker uses the same algorithm as one of the
partners. The attacker updates its own weight vectors only when an updating step is taken by the
parties. The natural move of an attacker in such an event is to follow the rule of the parties,

$$W_{ji}^{O+} = W_{ji}^{O} + K(W_{ji}^{O} x_{ji} \sigma^{B})\, x_{ji} \sigma^{B}\, \theta(\sigma^{A} \tau_i^{O})\, \theta(-\sigma^{A} \sigma^{B}),$$

indicating that only weight vectors belonging to the hidden units which are in agreement with the
output of party A are updated (more advanced attacks will be discussed elsewhere [?]). The
evolution of the overlap of an attacker depends on the evolution of 6 matrices: three matrices
describing the overlaps between the parties and, similarly, three matrices describing the overlaps
between the attacker and the first party. Note that the dynamics of the attacker depends on the
moves of the parties, which in turn depend on their overlaps. Hence, the time evolution of the six
matrices gives the full description of the overlaps between the attacker and the first party and
between the parties themselves. The mutual dynamics of the three networks, two parties and the
attacker, depends on the joint probability distribution of the $8 \times 8 \times 8$ IRs, and upon
the corresponding updates of the six matrices. The full description of the discrete time evolution
of the matrices and the overlaps will be given elsewhere [?].
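The attacker's move can be sketched as a single function. The attacker sees the common input and both public outputs, but only its own hidden units; the data layout `WO[i][j]` and the small example values are assumptions made for the illustration.

```python
def attacker_update(WO, X, tauO, sigmaA, sigmaB, L):
    # the attacker moves only when the parties move, i.e. when their outputs disagree
    if sigmaA == sigmaB:
        return WO
    for i, (Wi, Xi) in enumerate(zip(WO, X)):
        if tauO[i] != sigmaA:            # theta(sigma^A * tau_i^O): unit must agree with A's output
            continue
        for j, x in enumerate(Xi):
            step = x * sigmaB            # same increment that party A applies
            if Wi[j] * step != L:        # K(y) = 1 - delta_{L,y}: stay inside -L..L
                Wi[j] += step
    return WO

# tiny hand-made example with K = 3 hidden units, N = 2 inputs per unit and L = 1
WO = [[1, 0], [0, -1], [1, 1]]
X = [[1, -1], [1, 1], [-1, 1]]
tauO = [1, -1, 1]                        # attacker's own hidden units for this input
print(attacker_update(WO, X, tauO, sigmaA=1, sigmaB=-1, L=1))
```

The only difference from the rule of party A is that the attacker must use its own hidden units $\tau_i^{O}$, since the internal representation of A is not transmitted; whenever they differ from those of A, the attacker updates the wrong units.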

The analytical solution of the dynamics of the attacker indicates that full learning is achieved
in a finite number of steps, $t_{learn}$, where full learning is defined by
$\bar{\rho}^{A,O} > q$. In Table I, $t_{learn}$ and $t_{synch}$ are compared for various $L$.

  L     t_synch      t_learn                  r
  1     61 ± 10      (1.1 ± 0.2) · 10^2       1.8 ± 0.6
  2     188 ± 26     (1.5 ± 0.5) · 10^3       8.0 ± 2.9
  3     376 ± 51     (4.5 ± 1.3) · 10^4       120 ± 51
  4     673 ± 95     (6.9 ± 5.7) · 10^7       (1.04 ± 1.02) · 10^5

TABLE I: The average synchronization time, $t_{synch}$, the average learning time, $t_{learn}$,
their standard deviations, and the ratio $r = t_{learn}/t_{synch}$, averaged over 2000 different
runs of the iterative equations with the halting criterion $q = 1 - 10^{-5}$.



For $L = 1$ the average learning time is about twice the synchronization time, and one may reach
the wrong conclusion that the synchronization process always terminates before the learning
process. In Fig. 1 we present the histograms of the synchronization and learning times, and a
fairly good fit between analytical and simulation results is apparent. The two distributions,
$P(t_{synch})$ and $P(t_{learn})$, have a finite overlap, indicating that in a finite fraction of
the runs the learning process terminates before synchronization is achieved (which was indeed
observed in a finite fraction of the runs of the simulations). Hence the construction with
$L = 1$ is not a good candidate for building a secure channel.

For $L \geq 3$ the ratio $r = t_{learn}/t_{synch}$, averaged over the runs, was found to be
$r \gg 1$ (see Table I). For $L = 3$, we did not observe, in simulations over $10^5$ runs, a case
where $t_{learn}$ was shorter than $t_{synch}$. In Fig. 2 we present the histogram of the ratio
$r$, as found from different runs of the analytical equations. The minimal value of the ratio was
$r \sim 6$, whereas the largest ratio was $r \sim 680$. We found that the largest synchronization
times are smaller than 1000, whereas the typical learning time is $4.5 \cdot 10^4$.

FIG. 2: The distribution of $r = t_{learn}/t_{synch}$ for $L = 3$, obtained from the analytical
solution of about 1200 runs. The lowest value obtained for $r$ was $\sim 6$. Inset: the average
overlaps $\bar{\rho}^{A,B}$ (solid line) and $\bar{\rho}^{A,O}$ (dashed line) as a function of
$\alpha$ for PMs with continuous weights and $\eta = 3$. Symbols stand for simulation results with
$N = 5000$; error bars are smaller than the symbols.

Synchronization in the case of PMs with continuous weights is achievable only with the following
modifications. (a) Normalization of the weight vectors belonging to each one of the hidden units
after every updating step. The natural normalization we use is the spherical normalization,
$\sum_{j=1}^{N} W_{ji}^2 = N$. (b) The change in the strength of each weight (before
normalization) is $\eta/N$, where $\eta$ is a constant of order one. The synchronization time is
proportional to the size of the input, $N$, and therefore the analytical description of the system
is given by a set of coupled differential equations. Some limited results and a brief description
of the method are presented below. More detailed results will be given elsewhere [?].

The updating of the weights of the first party in the spherical case is given by the rule of
Eq. (1) with the modifications (a) and (b), and similarly the updating rules for the second party
and the attacker. The analytical calculation can be simplified in the continuous case by using the
probability that there is a mismatch between the two PMs given that there is a mismatch between
two hidden units,
$P_i^{-} = P(\sigma^{A} \neq \sigma^{B} \mid \tau_i^{A} \neq \tau_i^{B}) = \epsilon_p^{j} \epsilon_p^{k} + (1 - \epsilon_p^{j})(1 - \epsilon_p^{k})$,
where $j$ and $k$ denote the other two hidden units, and similarly
$P_i^{+} = P(\sigma^{A} \neq \sigma^{B} \mid \tau_i^{A} = \tau_i^{B}) = 1 - P_i^{-}$. One can map
the mutual process onto that of perceptrons, where the updating of the first party, for instance,
is given by

$$\mathbf{W}_i^{A+} = \frac{\mathbf{W}_i^{A} + (\eta/N)\, \mathbf{x}_i \tau_i^{B} \Delta_i^{A}}{\left| \mathbf{W}_i^{A} + (\eta/N)\, \mathbf{x}_i \tau_i^{B} \Delta_i^{A} \right|},$$

and similarly for the second party, where
$\Delta_i^{A} = \theta(-\tau_i^{A} \tau_i^{B})\, \theta(P_i^{-} - p_a) + \theta(\tau_i^{A} \tau_i^{B})\, \theta(P_i^{+} - p_b)\, \theta(\tfrac{1}{2} - p_c)$
and we use auxiliary variables $p_a$, $p_b$, $p_c$ to specify each run.

The next step consists of the following two averages: (a) an average over the joint probability
distribution of the local fields of the two parties; (b) an average over the auxiliary variables,
which is unique to the case of mutual learning. The normalized overlap, $\rho$, between the weight
vectors belonging to each pair of hidden units is found to obey the equation

$$\frac{d\rho}{d\alpha} = \eta \left[ C^2 + (1 - C)^2 \right] \left( \frac{1 - \rho}{\sqrt{2\pi}} - \frac{\eta C}{2} \right) (1 + \rho) - \frac{2 \eta (1 - \rho^2)\, C (1 - C)}{\sqrt{2\pi}} - \eta^2 \rho\, C (1 - C)^2,$$

where $C = \cos^{-1}(\rho) / \pi$. For $\eta < \eta_c \simeq 2.68$ the points $\rho = \pm 1$ are
repulsive fixed points of the above equation, whereas for $\eta > \eta_c$ a phase transition
occurs to a state of full synchronization.
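The critical value quoted above can be checked numerically from the equation itself. The sketch below evaluates the right-hand side, obtains $\eta_c$ from a linearization around $\rho = -1$ (this short derivation is ours and is stated in the comments), and integrates the equation with a plain Euler scheme; the step size, the integration length and the starting point $\rho = 0$ are assumptions.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)

def drho_dalpha(rho, eta):
    # right-hand side of the overlap equation, with C = arccos(rho) / pi
    C = math.acos(rho) / math.pi
    return (eta * (C**2 + (1 - C)**2) * ((1 - rho) / SQRT_2PI - eta * C / 2) * (1 + rho)
            - 2 * eta * (1 - rho**2) * C * (1 - C) / SQRT_2PI
            - eta**2 * rho * C * (1 - C)**2)

# Linearizing around rho = -1, where 1 - C ~ sqrt(2 (1 + rho)) / pi, gives
#   d rho / d alpha ~ (1 + rho) * eta * (2 / sqrt(2 pi) - eta * (1/2 - 2 / pi^2)),
# which changes sign at:
eta_c = (2 / SQRT_2PI) / (0.5 - 2 / math.pi**2)

def integrate(eta, rho0=0.0, alpha_max=60.0, dt=0.01):
    # plain Euler integration; rho is clamped to [-1, 1] to keep acos defined
    rho = rho0
    for _ in range(int(alpha_max / dt)):
        rho += dt * drho_dalpha(rho, eta)
        rho = max(-1.0, min(1.0, rho))
    return rho

print(round(eta_c, 2))
print(integrate(3.0), integrate(2.0))
```

Above $\eta_c$ the overlap started at zero flows to the anti-parallel point $\rho = -1$, while below $\eta_c$ it stops at an intermediate value, in line with the statement that $\rho = -1$ is repulsive there.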

The equation of motion for the overlap of an attacker with the first party after synchronization,
i.e., for $\rho^{A,B} = -1$ and $\rho^{A,O} = -\rho^{B,O}$, is given by

$$\frac{d\rho^{A,O}}{d\alpha} = \frac{\eta^2}{2} \left( 1 - \frac{\cos^{-1}(\rho^{A,O})}{\pi} - \rho^{A,O} \right).$$

The fixed point of this equation is $\rho^{A,O} = -\rho^{B,O} \simeq 0.79$ and is independent of
$\eta$, indicating that perfect learning is not achievable. Analytical results derived from the
last two equations in the case of $\eta = 3$ are presented in the inset of Fig. 2 and are in good
agreement with simulations with $N = 5000$ and 20 runs.
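The fixed point of the attacker's equation can be located with a simple bisection, as in the sketch below. The bracketing interval and the tolerance are assumptions; the prefactor $\eta^2/2$ is dropped because it does not affect the position of the zero.

```python
import math

def g(rho):
    # right-hand side of the attacker's equation without the positive prefactor eta^2 / 2
    return 1.0 - math.acos(rho) / math.pi - rho

def bisect(func, lo, hi, tol=1e-12):
    # assumes func(lo) and func(hi) have opposite signs
    flo = func(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        fmid = func(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rho_star = bisect(g, 0.5, 0.99)
print(round(rho_star, 2))
```

The right-hand side is positive below the fixed point and negative above it, so the attacker's overlap is attracted to this value. The expression also vanishes at $\rho^{A,O} = 1$, but it is negative just below that point, so an overlap smaller than one moves away from perfect learning.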